Update in force.ha value configuration value obtention #13391
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #13391   +/-   ##
=========================================
  Coverage     16.26%   16.26%
- Complexity    13434    13435    +1
=========================================
  Files          5666     5666
  Lines        500645   500646    +1
  Branches      60801    60801
=========================================
+ Hits          81426    81428    +2
+ Misses       410112   410109    -3
- Partials       9107     9109    +2
@blueorangutan package
@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
winterhazel left a comment
Looks good, did not test.
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18219
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
[SF] Trillian Build Failed (tid-16327)
Description
Currently, workflows that use the force.ha configuration only consider its global value, even though the configuration has Cluster scope. To fix this, these workflows now read the configuration's cluster-scoped value. If it is not configured at cluster level, the global value is used.
In some of the methods that read the configuration, the cluster ID was not available and the host could be null. In those cases, findClusterAndHostIdForVM is used, as it looks up the VM's host (or last host) and returns that host's cluster. If both the host and the cluster are null, it returns the cluster ID of the storage where the VM's volume is allocated.
This PR also removes the cleanup of the host's cluster ID that was executed during host removal. No impact was observed from removing this cleanup, and the processes that could be affected by it already contain validations to prevent errors.
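A minimal sketch of the lookup order described above, not the code changed by this PR. ScopedSetting, VmPlacement, resolveClusterId and isForceHaEnabled are hypothetical names introduced only for illustration; they stand in for a cluster-scoped setting and for the kind of resolution findClusterAndHostIdForVM is described as performing.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: none of these types are CloudStack classes.
final class ForceHaLookupSketch {

    // Hypothetical stand-in for a boolean setting with Cluster scope.
    static final class ScopedSetting {
        private final boolean globalValue;
        private final Map<Long, Boolean> clusterValues = new HashMap<>();

        ScopedSetting(boolean globalValue) {
            this.globalValue = globalValue;
        }

        void setForCluster(long clusterId, boolean value) {
            clusterValues.put(clusterId, value);
        }

        // Cluster-level value if one is set; otherwise the global value.
        boolean valueIn(Long clusterId) {
            if (clusterId == null) {
                return globalValue;
            }
            return clusterValues.getOrDefault(clusterId, globalValue);
        }
    }

    // Hypothetical view of where a VM lives; any field may be null.
    static final class VmPlacement {
        final Long hostClusterId;
        final Long lastHostClusterId;
        final Long volumeStorageClusterId;

        VmPlacement(Long hostClusterId, Long lastHostClusterId, Long volumeStorageClusterId) {
            this.hostClusterId = hostClusterId;
            this.lastHostClusterId = lastHostClusterId;
            this.volumeStorageClusterId = volumeStorageClusterId;
        }
    }

    // Assumed order: current host, then last host, then the volume's storage.
    static Long resolveClusterId(VmPlacement vm) {
        if (vm.hostClusterId != null) {
            return vm.hostClusterId;
        }
        if (vm.lastHostClusterId != null) {
            return vm.lastHostClusterId;
        }
        return vm.volumeStorageClusterId;
    }

    static boolean isForceHaEnabled(ScopedSetting forceHa, VmPlacement vm) {
        return forceHa.valueIn(resolveClusterId(vm));
    }

    public static void main(String[] args) {
        ScopedSetting forceHa = new ScopedSetting(false); // disabled globally
        forceHa.setForCluster(1L, true);                  // enabled for cluster 1 only

        VmPlacement runningInCluster1 = new VmPlacement(1L, null, null);
        VmPlacement lastRanInCluster2 = new VmPlacement(null, 2L, null);
        VmPlacement onlyStorageKnown = new VmPlacement(null, null, 1L);

        System.out.println(isForceHaEnabled(forceHa, runningInCluster1));
        System.out.println(isForceHaEnabled(forceHa, lastRanInCluster2));
        System.out.println(isForceHaEnabled(forceHa, onlyStorageKnown));
    }
}
```

In this sketch, a null cluster ID falls back to the global value rather than failing, which mirrors the fallback described for the case where no cluster-level value is configured.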
Types of changes
Feature/Enhancement Scale or Bug Severity
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
First, I validated that, without the changes, VMs without HA enabled in their offerings were not restarted even though the force.ha configuration was enabled at cluster scope. After installing the packages with the new changes, I ran the following tests. Each test was executed once with the configuration enabled globally and once with it enabled at cluster scope:
kill -9 <pid>
Disconnected, and the VM was restarted in another environment node.
Regarding the cluster ID cleanup removal, the following tests were conducted to check whether keeping it would cause inconsistencies:
Before the cluster ID cleanup removal, during the host's forced removal, HA restart jobs are created for the workers (ha.workers configuration) to process. However, if there aren't enough workers available to process the jobs and the force.ha configuration is enabled only at cluster scope, it is possible that the host's removal flow finishes before all HA jobs are processed, leading to inconsistent VMs.
To validate that this was fixed after the cleanup removal, in an environment with 2 hosts, I provisioned 35 VMs on one host and set ha.workers to 1. Then, I forcefully removed the host where the VMs were provisioned and validated that all the VMs were stopped and restarted on the other host.
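The description does not spell out why the VMs ended up inconsistent. The sketch below models one plausible reading, stated here as an assumption: HA jobs that are processed after the removal flow has cleared the host's cluster ID can no longer find the cluster-scoped force.ha value and fall back to the global one. It is a sequential toy model, not CloudStack code, and Host, forceHaFor and countRestarted are hypothetical names.

```java
// Illustrative only: a sequential model of the scenario, not CloudStack code.
final class HostRemovalOrderingSketch {

    // Assumed setup: force.ha is false globally and true only for this cluster.
    static final long HA_CLUSTER_ID = 1L;

    // Hypothetical host record; clusterId becomes null if removal clears it.
    static final class Host {
        Long clusterId;

        Host(Long clusterId) {
            this.clusterId = clusterId;
        }
    }

    // Cluster-scoped lookup with fallback to the (disabled) global value.
    static boolean forceHaFor(Long clusterId) {
        return clusterId != null && clusterId == HA_CLUSTER_ID;
    }

    // Counts VMs restarted when a single worker drains the job queue and the
    // removal flow ends after `jobsDoneBeforeRemovalEnds` jobs were processed.
    static int countRestarted(int vmCount, int jobsDoneBeforeRemovalEnds,
                              boolean clearClusterIdOnRemoval) {
        Host host = new Host(HA_CLUSTER_ID);
        int restarted = 0;
        for (int job = 0; job < vmCount; job++) {
            if (job == jobsDoneBeforeRemovalEnds && clearClusterIdOnRemoval) {
                host.clusterId = null; // removal finished while jobs were still queued
            }
            if (forceHaFor(host.clusterId)) {
                restarted++;
            }
        }
        return restarted;
    }

    public static void main(String[] args) {
        // 35 VMs and one worker, as in the test above; the "1 job done before
        // removal ends" figure is an arbitrary choice for the model.
        System.out.println("with cleanup:    " + countRestarted(35, 1, true));
        System.out.println("without cleanup: " + countRestarted(35, 1, false));
    }
}
```

Under these assumptions, keeping the cluster ID on the removed host lets every queued job see the cluster-scoped value, which is consistent with all 35 VMs being restarted on the other host in the test above.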